Audio-based synchronization server

ABSTRACT

An audio channel of a time-based media presentation provides a basis for synchronizing to the presentation across a variety of platforms independent of when and where the presentation is being viewed. By pre-processing the media into a series of non-unique hashes, and similarly processing an audio stream of the media captured at a client device, a comparison can be made that yields an accurate time offset within the presentation. The comparison may usefully be performed over a data network using a server that hosts data from the pre-processed media, and a variety of applications may be deployed on the client device based on the resulting synchronization.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 12/789,377 filed May 27, 2010, which claims the benefit of U.S. Prov. App. No. 61/181,472, filed on May 27, 2009, the entire content of each is hereby incorporated by reference.

BACKGROUND

Time-based media presentations such as movies, animation, sports events, live or pre-recorded television broadcasts, and so forth may be presented in a variety of formats and a variety of venues that may range from new movie releases in movie theaters to time-shifted home viewing of pre-recorded television broadcasts. There remains a need for synchronization capabilities that permit individual devices to synchronize to a time-based media presentation regardless of where and when the content is being displayed.

SUMMARY

An audio channel of a time-based media presentation provides a basis for synchronizing to the presentation across a variety of platforms independent of when and where the presentation is being viewed. By pre-processing the media into a series of non-unique hashes, and similarly processing an audio stream of the media captured at a client device, a comparison can be made that yields an accurate time offset within the presentation. The comparison may usefully be performed over a data network using a server that hosts data from the pre-processed media, and a variety of applications may be deployed on the client device based on the resulting synchronization.

In one aspect, a method disclosed herein includes receiving an audio portion of a time-based media presentation with a microphone of a client device; sampling the audio on the client device to obtain a sequence of digital samples of the audio portion; processing the sequence of digital samples to provide a plurality of hashes, each one of the plurality of hashes including a plurality of bits, each one of the plurality of hashes providing a non-unique representation of a segment of the audio portion, and each one of the plurality of hashes having a known relative time offset to each other one of the plurality of hashes; transmitting the plurality of hashes and a unique identifier for the time-based media presentation to a server; receiving from the server a time offset indicative of a current time offset within the time-based media presentation; and synchronizing an application on the client device to the time-based media presentation based upon the time offset.

The client device may include at least one of a mobile device, a cellular phone, a laptop computer, a notebook computer, and a netbook. The time-based media presentation may include one or more of a movie, a sports event, and a television broadcast. Adjacent ones of the plurality of hashes may be calculated from overlapping windows of the sequence of digital samples. Processing may include downsampling the sequence of digital samples to about five thousand five hundred Hertz. Processing may include filtering the sequence of digital samples with a low pass filter to provide a filtered output and transforming the filtered output with a discrete Fourier transform. Processing may include normalizing a magnitude of the sequence of digital samples.

The method may include, on the server, determining an allowable bit error for the plurality of hashes; identifying for each one of the plurality of hashes a set of candidate hashes with a number of bitwise variations from the one of the plurality of hashes no greater than the allowable bit error; locating any candidate time offsets in the time-based media presentation corresponding to each set of candidate hashes for each one of the plurality of hashes; updating scores for the candidate time offsets corresponding to each set of candidate hashes; selecting one of the candidate time offsets having a best one of the scores as the time offset within the time-based media presentation; and transmitting the time offset to the client device.
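
The server-side steps listed above (candidate hashes within an allowable bit error, candidate time offsets, scoring, and selection of the best score) might be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `candidates_within` and `best_offset`, the dictionary used as the hash table, the one-vote-per-match scoring, and the use of integer hash indices as time offsets are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidates_within(hash_value, max_errors, bits=32):
    """Yield every value that differs from hash_value in at most max_errors bits.

    Assumption: exhaustive enumeration of bit flips, used here only for clarity.
    """
    for flips in range(max_errors + 1):
        for positions in combinations(range(bits), flips):
            mask = 0
            for position in positions:
                mask |= 1 << position
            yield hash_value ^ mask


def best_offset(query, table, max_errors, bits=32):
    """Score candidate start offsets for a sequence of client hashes.

    query: list of (relative_index, hash) pairs from the client.
    table: dict mapping a reference hash to the list of offsets where it occurs.
    Returns (start_offset, score) for the best-scoring candidate, or None.
    """
    scores = defaultdict(int)
    for relative_index, hash_value in query:
        for candidate in candidates_within(hash_value, max_errors, bits):
            for offset in table.get(candidate, ()):
                # Each match votes for the offset at which the query would start.
                scores[offset - relative_index] += 1
    if not scores:
        return None
    return max(scores.items(), key=lambda item: item[1])
```

A caller could compare the returned score against a threshold before reporting the offset to the client. The example is practical only for small hash widths and error budgets, since the number of candidates grows quickly with both.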

Determining the allowable bit error may include receiving the allowable bit error from the client device. Each one of the plurality of hashes may consist of thirty two bits and the allowable bit error may be eight bits. Identifying a set of candidate hashes may include providing a binary tree of all possible values for the hash and traversing the binary tree in a manner that excludes branches for binary values that exceed the allowable bit error for the hash. The method may include conditionally transmitting the time offset only when the best one of the scores exceeds a predetermined threshold. The method may include transmitting supplemental information to the server including a hash sequence number that identifies an order of the plurality of hashes relative to one another. Synchronizing may include displaying an indicator on the client device that indicates a synchronization with the time-based media presentation. The method may include rendering additional content on the client device under control of the application, the additional content synchronized to the time-based media presentation. Rendering additional content may include rendering one or more of a supplemental video stream, contextual information, advertising, and interactive content.
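
The binary tree traversal mentioned above can be illustrated with a short sketch. It assumes an implicit tree in which each level corresponds to one bit of the hash, most significant bit first, and in which taking the branch that disagrees with the received hash costs one bit error. The function name `candidate_hashes` and the recursive formulation are hypothetical choices made for this illustration.

```python
def candidate_hashes(query, max_errors, bits=32):
    """Return all values within max_errors bit errors of query.

    The tree of all possible values is walked one bit per level. A branch
    that would exceed the allowable bit error is never entered, so the
    subtree beneath it is excluded from the traversal.
    """
    results = []

    def descend(depth, prefix, errors):
        if depth == bits:
            results.append(prefix)
            return
        bit = (query >> (bits - 1 - depth)) & 1
        # Branch that matches the received hash: no additional error.
        descend(depth + 1, (prefix << 1) | bit, errors)
        # Branch that differs: allowed only while the error budget remains.
        if errors < max_errors:
            descend(depth + 1, (prefix << 1) | (bit ^ 1), errors + 1)

    descend(0, 0, 0)
    return results
```

For a thirty-two bit hash with an allowable bit error of eight bits, this traversal visits roughly fifteen million candidate values out of about 4.3 billion possible values, which is the saving that the pruning provides.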

A computer program product for audio-based synchronization disclosed herein may include code that performs the steps of: receiving an audio portion of a time-based media presentation with a microphone of a client device; sampling the audio on the client device to obtain a sequence of digital samples of the audio portion; processing the sequence of digital samples to provide a plurality of hashes, each one of the plurality of hashes including a plurality of bits, each one of the plurality of hashes providing a non-unique representation of a segment of the audio portion, and each one of the plurality of hashes having a known relative time offset to each other one of the plurality of hashes; transmitting the plurality of hashes and a unique identifier for the time-based media presentation to a server; receiving from the server a time offset indicative of a current time offset within the time-based media presentation; and synchronizing an application on the client device to the time-based media presentation based upon the time offset.

The client device may include at least one of a mobile device, a cellular phone, a laptop computer, a notebook computer, and a netbook. The time-based media presentation may include one or more of a movie, a sports event, and a television broadcast. Adjacent ones of the plurality of hashes are calculated from overlapping windows of the sequence of digital samples. Processing may include downsampling the sequence of digital samples to about five thousand five hundred Hertz. Processing may include filtering the sequence of digital samples with a low pass filter to provide a filtered output and transforming the filtered output with a discrete Fourier transform. Processing may include normalizing a magnitude of the sequence of digital samples.

The computer program product may further include code that when executing on a server performs the steps of: determining an allowable bit error for the plurality of hashes; identifying for each one of the plurality of hashes a set of candidate hashes with a number of bitwise variations from the one of the plurality of hashes no greater than the allowable bit error; locating any candidate time offsets in the time-based media presentation corresponding to each set of candidate hashes for each one of the plurality of hashes; updating scores for the candidate time offsets corresponding to each set of candidate hashes; selecting one of the candidate time offsets having a best one of the scores as the time offset within the time-based media presentation; and transmitting the time offset to the client device.

The allowable bit error may be a variable bit error received from the client device. Each one of the plurality of hashes may consist of thirty two bits and the allowable bit error may be eight bits. Identifying the set of candidate hashes may include providing a binary tree of all possible values for the hash and traversing the binary tree in a manner that excludes branches for binary values that exceed the allowable bit error for the hash. The computer program product may include code that, when executing on the server, performs the step of conditionally transmitting the time offset only when the best one of the scores exceeds a predetermined threshold. The computer program product may include code that, when executing on one or more computers, performs the step of transmitting supplemental information to the server including a hash sequence number that identifies an order of the plurality of hashes relative to one another. Synchronizing may include displaying an indicator on the client device that indicates a synchronization with the time-based media presentation. The computer program product may include code that performs the step of rendering additional content on the client device under control of the application, the additional content synchronized to the time-based media presentation. Rendering additional content may include rendering one or more of a supplemental video stream, contextual information, advertising, and interactive content.

In another aspect, a device disclosed herein includes a microphone that receives an audio portion of a time-based media presentation and converts the audio portion into electrical signals; an analog-to-digital converter coupled to the microphone that receives the electrical signals and provides a sequence of digital samples of the audio portion; a network interface for communicating over a data network; a processor coupled to the network interface and the analog-to-digital converter, the processor including processing circuitry configured to perform the steps of processing the sequence of digital samples to provide a plurality of hashes, each one of the plurality of hashes including a plurality of bits, each one of the plurality of hashes providing a non-unique representation of a segment of the audio portion, and each one of the plurality of hashes having a known relative time offset to each other one of the plurality of hashes, and the processor further configured to transmit the plurality of hashes and a unique identifier for the time-based media presentation to a server through the network interface and to receive from the server a time offset indicative of a current time offset within the time-based media presentation; and a display under control of the processor that renders an output synchronized to the time-based media presentation according to the time offset.

The microphone, the analog-to-digital converter, the processor, the network interface, and the display may be integrated into a housing for at least one of a mobile device, a cellular phone, a laptop computer, a notebook computer, and a netbook. The time-based media presentation may include one or more of a movie, a sports event, and a television broadcast. Adjacent ones of the plurality of hashes may be calculated from overlapping windows of the sequence of digital samples. Processing may include downsampling the sequence of digital samples to about five thousand five hundred Hertz. Processing may include filtering the sequence of digital samples with a low pass filter to provide a filtered output and transforming the filtered output with a discrete Fourier transform. Processing may include normalizing a magnitude of the sequence of digital samples.

The server may be configured to perform the steps of: determining an allowable bit error for the plurality of hashes; identifying for each one of the plurality of hashes a set of candidate hashes with a number of bitwise variations from the one of the plurality of hashes no greater than the allowable bit error; locating any candidate time offsets in the time-based media presentation corresponding to each set of candidate hashes for each one of the plurality of hashes; updating scores for the candidate time offsets corresponding to each set of candidate hashes; selecting one of the candidate time offsets having a best one of the scores as the time offset within the time-based media presentation; and responding to the network interface with the time offset.

The allowable bit error may be a variable bit error received by the server from the network interface. Each one of the plurality of hashes may consist of thirty two bits and the allowable bit error may be eight bits. Identifying a set of candidate hashes may include providing a binary tree of all possible values for the hash and traversing the binary tree in a manner that excludes branches for binary values that exceed the allowable bit error for the hash. Transmitting the time offset may include conditionally transmitting the time offset only when the best one of the scores exceeds a predetermined threshold. The processor may be further configured to transmit supplemental information to the server including a hash sequence number that identifies an order of the plurality of hashes relative to one another. Synchronizing may include displaying an indicator on the client device that indicates a synchronization with the time-based media presentation. The processor may be further configured to render additional content on the display under control of the processor, the additional content synchronized to the time-based media presentation. The additional content may include one or more of a supplemental video stream, contextual information, advertising, and interactive content.

In another aspect, a method disclosed herein includes receiving an audio portion of a time-based media presentation with a microphone of a client device as a plurality of digital samples; processing the plurality of digital samples to obtain a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to a portion of the time-based media presentation, and each one of the plurality of hashes having a relative time offset to each other one of the plurality of hashes; and analyzing the plurality of hashes to determine a time offset within the time-based media presentation most closely corresponding to the plurality of hashes.

Analyzing the plurality of hashes may include transmitting the plurality of hashes to a server over a data network. The method may include receiving from the server the time offset within the time-based media presentation. The method may include receiving from the server one or more candidate offsets for each one of the plurality of hashes and locally processing the one or more candidate offsets to determine the time offset within the time-based media presentation. The method may include transmitting supplemental information to the server including an identifier that uniquely identifies the time-based media presentation. The method may include transmitting supplemental information to the server including a hash sequence number that identifies an order of each one of the plurality of hashes relative to one or more other ones of the plurality of hashes. Processing the plurality of digital samples may include processing more than one of the plurality of digital samples to create each one of the plurality of hashes. Processing the plurality of digital samples may include processing more than one of the plurality of digital samples to provide a calculated hash and determining a number of bitwise variations to the calculated hash to provide two or more of the plurality of hashes.

The number of bitwise variations to the calculated hash may include each value with no more bitwise variations from the calculated hash than an allowable error rate. The client device may include at least one of a computer, a cellular phone, and a portable digital device. The time-based media presentation may include a live television broadcast. The time-based media presentation may include a time-shifted replay of a television broadcast. The time-based media presentation may include pre-recorded media from one or more of a CD, a DVD, a Blu-ray Disc, and an HD DVD. Processing the plurality of digital samples may include filtering the plurality of digital samples with a low pass filter and transforming the resulting data with a discrete Fourier transform. Processing the plurality of digital samples may include normalizing a magnitude of the plurality of digital samples. The method may include synchronizing to the time-based media presentation and rendering supplemental content on the client device that may be time-synchronized to the time-based media presentation. Synchronizing may include synchronizing an application executing on the client device. The method may include displaying an indication of synchronization status on the client device. The method may include conditionally synchronizing the application to the time-based media presentation only when the time offset can be determined with a predetermined certainty. The supplemental content may include an advertisement. The supplemental content may include content retrieved from a remote site through a data network.

In another aspect, a computer program product for synchronizing to media described herein includes computer executable code embodied on a non-transitory computer readable medium that, when executing on one or more computing devices, performs the steps of: receiving an audio portion of a time-based media presentation with a microphone of a client device as a plurality of digital samples; processing the plurality of digital samples to obtain a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to a portion of the time-based media presentation, and each one of the plurality of hashes having a relative time offset to each other one of the plurality of hashes; and analyzing the plurality of hashes to determine a time offset within the time-based media presentation most closely corresponding to the plurality of hashes.

In another aspect, a system disclosed herein includes: receiving means for receiving an audio portion of a time-based media presentation with a microphone of a client device as a plurality of digital samples; processing means for processing the plurality of digital samples to obtain a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to a portion of the time-based media presentation, and each one of the plurality of hashes having a relative time offset to each other one of the plurality of hashes; and analyzing means for analyzing the plurality of hashes to determine a time offset within the time-based media presentation most closely corresponding to the plurality of hashes.

In another aspect, a device disclosed herein includes a microphone that receives an audio portion of a time-based media presentation and converts the audio portion into electrical signals; an analog-to-digital converter coupled to the microphone that receives the electrical signals and provides a sequence of digital samples of the audio portion; a processor coupled to the analog-to-digital converter, the processor including processing circuitry configured to perform the steps of processing the sequence of digital samples to obtain a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to a portion of the time-based media presentation, and each one of the plurality of hashes having a relative time offset to each other one of the plurality of hashes, and analyzing the plurality of hashes to determine a time offset within the time-based media presentation most closely corresponding to the plurality of hashes, and to provide an output synchronized to the time offset within the time-based media; and a display that renders the output.

The device may include a data network interface, wherein the processing circuitry may be further configured to transmit the plurality of hashes to a remote server through the data network interface and to receive through the data network interface from the server data indicative of the time offset.

In another aspect, a method disclosed herein includes: receiving a time-based media presentation that may include an audio portion; sampling the audio to obtain a sequence of digital samples of the audio portion; processing the sequence of digital samples to provide a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to one or more time offsets within the time-based media presentation; storing the plurality of hashes and the one or more time offsets in a hash table on a server; and configuring the server to respond to a request that contains a second plurality of hashes, each one of the second plurality of hashes having a predetermined relative offset to each other one of the second plurality of hashes, by retrieving from the hash table a plurality of candidate offsets within the time-based media presentation corresponding to the second plurality of hashes.
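
The storing and retrieving steps above can be pictured with a small sketch. It assumes an in-memory dictionary as the hash table and integer hash indices as time offsets; the names `build_hash_table` and `lookup_candidates` are hypothetical. Because the hashes are non-unique, one hash may map to several offsets.

```python
from collections import defaultdict


def build_hash_table(reference_hashes):
    """Map each reference hash to every offset at which it occurs.

    reference_hashes: hashes of the pre-processed media, in time order.
    """
    table = defaultdict(list)
    for offset, hash_value in enumerate(reference_hashes):
        table[hash_value].append(offset)
    return dict(table)


def lookup_candidates(table, request_hashes):
    """Return, for each hash in a request, its list of candidate offsets."""
    return [table.get(hash_value, []) for hash_value in request_hashes]
```

The per-hash candidate lists returned here would still need to be resolved into a single offset, either on the server or on the client device that made the request.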

The method may include transmitting the plurality of candidate offsets from the server to a client device that initiated the request. The method may include resolving the plurality of candidate offsets into an offset within the time-based media presentation that most closely corresponds to the plurality of candidate offsets on the client device. The method may include resolving the plurality of candidate offsets into an offset within the time-based media presentation that most closely corresponds to the plurality of candidate offsets and transmitting the offset to the client device that initiated the request. The method may include calculating on the server a plurality of bitwise variations to each one of the second plurality of hashes, thereby providing a third plurality of hashes representative of the second plurality of hashes and a number of bit errors therein, and retrieving any time offsets corresponding to each one of the third plurality of hashes as the plurality of candidate offsets. The second plurality of hashes received by the server may include a number of bitwise variations to client-calculated hashes, wherein the number of bitwise variations are representative of potential bit errors in the client-calculated hashes. The method may include storing on the server a plurality of hash tables for each one of a plurality of time-based media presentations. The method may include receiving from a client device that initiated the request a unique identification of the one of the plurality of time-based media presentations from which the second plurality of hashes was obtained. The plurality of time-based media presentations include television broadcasts. The plurality of time-based media presentations include pre-recorded media distributed on one or more of a CD, a DVD, a Blu-ray Disc, and an HD DVD. The second plurality of hashes may be obtained from a time shifted viewing of one of the plurality of time-based media presentations. Processing the sequence of digital samples to provide a plurality of hashes may include low pass filtering the sequence of digital samples. Processing the sequence of digital samples to provide a plurality of hashes may include normalizing a magnitude of the sequence of digital samples. Processing the sequence of digital samples may include windowing the sequence of digital samples to provide a series of overlapping sets of digital samples from the sequence of digital samples. Processing the sequence of digital samples may include transforming each one of the overlapping sets of digital samples into a frequency-domain representation. Processing the sequence of digital samples may include dividing the frequency-domain representation into a plurality of frequency bands and converting each one of the plurality of frequency bands into a binary value according to a relative power of the one of the plurality of frequency bands to the other ones of the plurality of frequency bands within the frequency-domain representation. The binary value may consist of a one or a zero.
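
The windowing, frequency-domain transform, and band-to-bit conversion described above might look like the following sketch. Several details are assumptions made only for illustration: the window and hop sizes, the rectangular (unweighted) windows, the naive discrete Fourier transform, the omission of the low pass filtering and downsampling steps, and in particular the rule that a band yields a one when its power exceeds the mean power of all bands in the same window. The names `band_powers`, `frame_hash`, and `audio_hashes` are hypothetical.

```python
import cmath
import math


def band_powers(frame, bands):
    """Return the power in each of `bands` equal-width frequency bands.

    Uses a naive discrete Fourier transform over the positive frequencies
    (the DC term is skipped) so the sketch needs only the standard library.
    """
    n = len(frame)
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    size = len(power) // bands
    return [sum(power[b * size:(b + 1) * size]) for b in range(bands)]


def frame_hash(frame, bands=32):
    """Pack one bit per band: 1 if the band is stronger than the average band."""
    powers = band_powers(frame, bands)
    mean = sum(powers) / bands
    value = 0
    for p in powers:
        value = (value << 1) | (1 if p > mean else 0)
    return value


def audio_hashes(samples, window=256, hop=64, bands=32):
    """Hash overlapping windows; adjacent windows share window - hop samples."""
    return [frame_hash(samples[start:start + window], bands)
            for start in range(0, len(samples) - window + 1, hop)]
```

Because each bit depends only on relative band power within a window, scaling the input by a constant leaves the hashes unchanged in this sketch, which is one way to see why magnitude differences between capture devices need not change the result.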

In another aspect, a system disclosed herein includes receiving means for receiving a time-based media presentation that may include an audio portion; sampling means for sampling the audio to obtain a sequence of digital samples of the audio portion; processing means for processing the sequence of digital samples to provide a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to one or more time offsets within the time-based media presentation; storing means for storing the plurality of hashes and the one or more time offsets in a hash table; and server means for responding to a request that contains a second plurality of hashes, each one of the second plurality of hashes having a predetermined relative offset to each other one of the second plurality of hashes, by retrieving from the hash table a plurality of candidate offsets within the time-based media presentation corresponding to the second plurality of hashes.

In another aspect, a computer program product for audio-based synchronization disclosed herein includes computer executable code embodied on a non-transitory computer readable medium that, when executing on one or more computing devices, performs the steps of: receiving a time-based media presentation that may include an audio portion; sampling the audio to obtain a sequence of digital samples of the audio portion; processing the sequence of digital samples to provide a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to one or more time offsets within the time-based media presentation; storing the plurality of hashes and the one or more time offsets in a hash table on a server; and configuring the server to respond to a request that contains a second plurality of hashes, each one of the second plurality of hashes having a predetermined relative offset to each other one of the second plurality of hashes, by retrieving from the hash table a plurality of candidate offsets within the time-based media presentation corresponding to the second plurality of hashes.

In another aspect, a device disclosed herein includes a database that stores a hash table, the hash table containing a plurality of hashes, each one of the plurality of hashes processed from a sequence of digital samples in an audio portion of a time-based media presentation, wherein each one of the plurality of hashes non-uniquely corresponds to one or more time offsets within the time-based media presentation; and a server coupled in a communicating relationship with the database and a data network, the server configured to respond to a request that contains a second plurality of hashes, each one of the second plurality of hashes having a predetermined relative offset to each other one of the second plurality of hashes, by retrieving from the hash table a plurality of candidate offsets within the time-based media presentation that correspond to the second plurality of hashes.

In another aspect, a method disclosed herein includes: transmitting a broadcast of a time-based media presentation; receiving audience feedback relating to the time-based media presentation over a data network during the broadcast thereby providing live audience feedback; synchronizing at least one client device to a time-shifted view of the time-based media presentation; receiving additional client feedback from the client device synchronously with the time-shifted view; and combining the additional client feedback with the live audience feedback according to a time offset within the time-based media presentation, thereby providing feedback data that aggregates audience feedback synchronized to both of a live version of the time-based media presentation and the time-shifted view of the time-based media presentation.
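
The combining step above can be sketched as a merge keyed on the time offset within the presentation rather than on wall-clock time. The sketch assumes each feedback item has already been tagged with its offset in seconds (for time-shifted viewers, an offset obtained through synchronization) and groups items into fixed-width buckets. The function name `combine_feedback` and the ten-second bucket width are hypothetical.

```python
from collections import defaultdict


def combine_feedback(live, time_shifted, bucket_seconds=10):
    """Aggregate live and time-shifted feedback by presentation time offset.

    live, time_shifted: iterables of (offset_seconds, response) pairs.
    Returns a dict mapping the start of each time bucket to its responses.
    """
    combined = defaultdict(list)
    for offset, response in list(live) + list(time_shifted):
        bucket = int(offset // bucket_seconds) * bucket_seconds
        combined[bucket].append(response)
    return dict(combined)
```

With this grouping, a response entered by a viewer watching a recording a day later lands in the same bucket as the responses entered at that moment of the live broadcast.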

The time-based media presentation may be a sports event. The time-based media presentation may be a live television broadcast. The time-based media presentation may be a pre-recorded television broadcast. Synchronizing may include synchronizing based upon audio content within the time-based media presentation. Synchronizing may include receiving a plurality of hashes of the audio content from the client device and resolving the time offset within the time-shifted view based upon the plurality of hashes. The client device may include one or more of a laptop computer, a notebook computer, a mobile device, and a cellular phone.

In another aspect, a computer program product for tracking audience participation described herein includes computer executable code embodied in a non-transitory computer readable medium that, when executing on one or more computing devices, performs the steps of: transmitting a broadcast of a time-based media presentation; receiving audience feedback relating to the time-based media presentation over a data network during the broadcast thereby providing live audience feedback; synchronizing at least one client device to a time-shifted view of the time-based media presentation; receiving additional client feedback from the client device synchronously with the time-shifted view; and combining the additional client feedback with the live audience feedback according to a time offset within the time-based media presentation, thereby providing feedback data that aggregates audience feedback synchronized to both of a live version of the time-based media presentation and the time-shifted view of the time-based media presentation.

The time-based media presentation may be a sports event. The time-based media presentation may be a live television broadcast. The time-based media presentation may be a pre-recorded television broadcast. Synchronizing may include synchronizing based upon audio content within the time-based media presentation. Synchronizing may include receiving a plurality of hashes of the audio content from the client device and resolving the time offset within the time-shifted view based upon the plurality of hashes. The client device may include one or more of a laptop computer, a notebook computer, a mobile device, and a cellular phone.

In another aspect, a device disclosed herein includes: an interface to a data network; a database; and a processor configured to receive audience feedback over a data network relating to a live television broadcast, and to store the audience feedback in the database as live audience feedback; the processor further configured to synchronize at least one client device to a time-shifted view of the live television broadcast based upon audio content within the time-shifted view of the live television broadcast, and to receive additional client feedback from the at least one client device over the data network synchronously with the time-shifted view; and to combine the additional client feedback with the live audience feedback according to a time offset within the time-based media presentation, thereby providing feedback data that aggregates audience feedback synchronized to both the live television broadcast and a time-shifted view of the live television broadcast.

The processor may be further configured to transmit supplemental content to the at least one client device that may be synchronized to the time-shifted view. The data network may include the Internet. The interface to the data network may include a web server. The processor may be configured to synchronize the at least one client device to the time-shifted view based upon a plurality of hashes created by the client device based upon the audio content and transmitted to the processor over the data network. The audience feedback may include responses to explicit audience questions.

In another aspect, a method disclosed herein includes: receiving a plurality of hashes of audio content over a data network from a plurality of client devices exposed to a television broadcast; and identifying an occurrence of a commercial break in the television broadcast based upon variations in concurrent ones of the plurality of hashes received from different ones of the client devices.

The method may include identifying a channel change in proximity to one of the plurality of client devices based upon a variation in the ones of the plurality of hashes received from the one of the plurality of client devices and other ones of the plurality of hashes received concurrently from other ones of the plurality of client devices. The method may include inferring a geographic proximity among two or more of the plurality of client devices based upon a similarity of concurrent ones of the plurality of hashes received from the two or more of the plurality of client devices during the commercial break. The method may include determining whether a local advertisement or a network advertisement is being aired during the commercial break based upon variations among the plurality of hashes received from different ones of the plurality of client devices. The plurality of client devices may include one or more of a laptop computer, a notebook computer, a netbook computer, a cellular phone, and a personal digital device. Each one of the plurality of hashes may include a processed representation of digital samples of the audio content captured by each one of the plurality of client devices.

In another aspect, a computer program product disclosed herein includes computer executable code that, when executing on one or more computing devices, performs the steps of: receiving a plurality of hashes of audio content over a data network from a plurality of client devices exposed to a television broadcast; and identifying an occurrence of a commercial break in the television broadcast based upon variations in concurrent ones of the plurality of hashes received from different ones of the client devices.

The computer program product may include code that performs the step of identifying a channel change in proximity to one of the plurality of client devices based upon a variation in the ones of the plurality of hashes received from the one of the plurality of client devices and other ones of the plurality of hashes received concurrently from other ones of the plurality of client devices. The computer program product may include code that performs the step of inferring a geographic proximity among two or more of the plurality of client devices based upon a similarity of concurrent ones of the plurality of hashes received from the two or more of the plurality of client devices during the commercial break. The computer program product may include code that performs the step of determining whether a local advertisement or a network advertisement is being aired during the commercial break based upon variations among the plurality of hashes received from different ones of the plurality of client devices. The plurality of client devices may include one or more of a laptop computer, a notebook computer, a netbook computer, a cellular phone, and a personal digital device. Each one of the plurality of hashes may include a processed representation of digital samples of the audio content captured by each one of the plurality of client devices.

In one aspect, a method disclosed herein includes receiving a time-based media presentation that includes an audio portion; sampling the audio to obtain a sequence of digital samples of the audio portion; processing the sequence of digital samples to provide a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to one or more time offsets within the time-based media presentation; storing the plurality of hashes and the one or more time offsets in a hash table on a server; and configuring the server to respond to a request that contains a second plurality of hashes, each one of the second plurality of hashes having a predetermined relative offset to each other one of the second plurality of hashes, by retrieving from the hash table a plurality of candidate offsets within the time-based media presentation corresponding to the second plurality of hashes.

The method may include transmitting the plurality of candidate offsets from the server to a client device that initiated the request. The method may include resolving the plurality of candidate offsets into an offset within the time-based media presentation that most closely corresponds to the plurality of candidate offsets on the client device. The method may include resolving the plurality of candidate offsets into an offset within the time-based media presentation that most closely corresponds to the plurality of candidate offsets and transmitting the offset to the client device that initiated the request. The method may include calculating on the server a plurality of bitwise variations to each one of the second plurality of hashes, thereby providing a third plurality of hashes representative of the second plurality of hashes and a number of bit errors therein, and retrieving any time offsets corresponding to each one of the third plurality of hashes as the plurality of candidate offsets. The second plurality of hashes received by the server may include a number of bitwise variations to client-calculated hashes, wherein the number of bitwise variations may be representative of potential bit errors in the client-calculated hashes. The method may include storing on the server a plurality of hash tables for each one of a plurality of time-based media presentations. The method may include receiving from a client device that initiated the request a unique identification of the one of the plurality of time-based media presentations from which the second plurality of hashes was obtained. The plurality of time-based media presentations may include television broadcasts. The plurality of time-based media presentations may include pre-recorded media distributed on one or more of a CD, a DVD, a Blu-ray Disc, and an HDDVD. The second plurality of hashes may be obtained from a time-shifted viewing of one of the plurality of time-based media presentations. Processing the sequence of digital samples to provide a plurality of hashes may include low pass filtering the sequence of digital samples. Processing the sequence of digital samples to provide a plurality of hashes may include normalizing a magnitude of the sequence of digital samples. Processing the sequence of digital samples may include windowing the sequence of digital samples to provide a series of overlapping sets of digital samples from the sequence of digital samples. Processing the sequence of digital samples may include transforming each one of the overlapping sets of digital samples into a frequency-domain representation. Processing the sequence of digital samples may include dividing the frequency-domain representation into a plurality of frequency bands and converting each one of the plurality of frequency bands into a binary value according to a relative power of the one of the plurality of frequency bands to the other ones of the plurality of frequency bands within the frequency-domain representation. The binary value may consist of a one or a zero.

In another aspect, a computer program product disclosed herein for audio-based synchronization includes computer executable code embodied on a non-transitory computer readable medium that, when executing on one or more computing devices, may perform the steps of: receiving a time-based media presentation that includes an audio portion; sampling the audio to obtain a sequence of digital samples of the audio portion; processing the sequence of digital samples to provide a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to one or more time offsets within the time-based media presentation; storing the plurality of hashes and the one or more time offsets in a hash table on a server; and responding to a request that contains a second plurality of hashes, each one of the second plurality of hashes having a predetermined relative offset to each other one of the second plurality of hashes, by retrieving from the hash table a plurality of candidate offsets within the time-based media presentation corresponding to the second plurality of hashes.

In another aspect, a device disclosed herein includes a database that stores a hash table, the hash table containing a plurality of hashes, each one of the plurality of hashes processed from a sequence of digital samples in an audio portion of a time-based media presentation, wherein each one of the plurality of hashes non-uniquely corresponds to one or more time offsets within the time-based media presentation; and a server coupled in a communicating relationship with the database and a data network, the server configured to respond to a request that contains a second plurality of hashes, each one of the second plurality of hashes having a predetermined relative offset to each other one of the second plurality of hashes, by retrieving from the hash table a plurality of candidate offsets within the time-based media presentation corresponding to the second plurality of hashes.

DRAWINGS

The invention may be more fully understood with reference to the accompanying drawings wherein:

FIG. 1 is a block diagram of a synchronization system.

FIG. 2 is a flow chart of a server-side process for synchronization.

FIG. 3 illustrates a technique for identifying bitwise variations to a binary value.

FIG. 4 is a flow chart of a client-side process for synchronization.

FIG. 5 is a block diagram of an audience tracking system.

FIG. 6 is a flow chart of an audience tracking process.

DETAILED DESCRIPTION

Disclosed herein are systems, methods, devices, computer code, and means for synchronizing to a time-based media presentation based upon an audio channel of the time-based media presentation. It will be understood that while an audio channel provides one useful source for synchronization, any channel such as a video, slide show, or concurrent data channel may also or instead be used for synchronization as described herein.

FIG. 1 is a block diagram of a synchronization system. The system 100 may include a client device 102 with a display 104, a processor 106, a memory 108, an analog-to-digital converter 109, a microphone 110, and a data network interface 112. The system may further include a media source 114, a media platform 116 that emits an audio portion 118 of a time-based media presentation, a data network 120, a server 122 including a data network interface 124 and a database 126, and data network content sources 128.

The client device 102 may be any device with a housing having a microphone 110, a data network interface 112, and other components collectively capable of performing the functions generally described herein. By way of example and not of limitation, this may include a laptop computer, a notebook computer, a netbook computer, and a desktop computer. This may also or instead include a communication device such as a cellular phone, electronic mail device, or the like. The client device 102 may also or instead include a mobile device such as a personal digital assistant, media player, smart phone, iPod, or the like.

The display 104 may be a screen or the like for displaying graphical information. By way of generality, the client device 102 may also provide for any of a variety of outputs including text, pictures, video, sound, and so forth, and all such output devices, or any other output devices that can be controlled by the client device 102 to provide information (e.g., buzzers, light-emitting diodes, etc.) are intended to fall within the scope of the display 104 as that term is used herein.

The processor 106 may include a general purpose microprocessor, a digital signal processor, an application specific integrated circuit, or any other processing circuitry or combination of the foregoing that controls operation of the client device 102 and the components thereof, as further programmed or otherwise configured to perform the additional processing for synchronization as described herein. This may in general include software executing on a general processing unit of the processor 106, or a dedicated, special purpose processor or other processing circuitry or hardware configured to perform the synchronization functions described herein, or a chipset or the like controlled by the processor to perform the synchronization functions described herein. All such variations that would be apparent to one of ordinary skill in the art are intended to fall within the scope of this disclosure.

The memory 108 may include any conventional memory for an electronic device suitable for storing digital samples from the microphone 110, and otherwise supporting synchronization functions as described herein.

The analog-to-digital converter 109 may be any combination of circuits, processors, chips, chipsets and the like suitable for capturing a sequence of digital samples from an analog microphone signal received from the microphone 110. One common sampling rate consistent with Compact Disc quality audio is 44.1 kHz with 16 bit samples. However, it will be understood that other rates and sample sizes are commonly employed in a variety of applications, and larger or smaller samples, at higher or lower sample rates, may be provided by the analog-to-digital converter without departing from the scope of this disclosure.

The microphone 110 may be any microphone capable of converting audio energy to electrical signals for use by the analog-to-digital converter 109. This may for example include a microphone integrated into the client device 102, or an external microphone connected to the client device 102 through a jack or input plug, or some combination of these. It should also be appreciated that while specific hardware is described, this description is by way of an example of a common, commercially available architecture. More generally, any combination of components suitable for converting audio energy into digital samples may be suitably adapted for use with the client device 102 described herein.

The data network interface 112 may include any hardware for connecting the client device 102 in a communicating relationship with a data network such as the data network 120. This may for example include a data network interface card for wired Ethernet or other wired connectivity, or this may include a wireless data networking circuit supporting standardized or proprietary data network communications. Common standards that may be usefully employed in the data network interface 112 of the client device 102 include Bluetooth, IEEE 802.11 (e.g., WiFi), IEEE 802.16 (e.g., WiMax), and cellular or other wide area broadband data standards, as well as combinations of the foregoing.

The media source 114 may be any source of a time-based media presentation. This may, for example, include a DVD, HDDVD, Blu-ray Disc, or other optical, magnetic, or electronic media having content pre-recorded thereon, along with any computer, disc player, tape player, or other device used to provide an electronic version of the pre-recorded content. The media source 114 may also include a broadcast medium such as analog or digital television broadcasts, cable television, Internet television, and so forth. The media source 114 may also include a source of media for time-shifted viewing of a television broadcast or the like such as a Digital Video Recorder, or other local or data networked archive of content for time-shifted viewing. This may also or instead include on-demand programming received through a cable data network, a data network (e.g., the Internet) or the like. This may also or instead include streaming media from an Internet data source or the like. While video multimedia such as movies, sports events, television broadcasts, and any other live or pre-recorded video and the like is generally contemplated as time-based media, it will be appreciated that time-based media may more generally include any media that changes over time such as sound recordings, radio programs, music, slide shows, animations, animated graphics, video games, and so forth, any of which may be stored on a pre-recorded medium, received over a data network, received through a cable data network, received through an aired broadcast, or otherwise made available in a locally reproducible form as a time-based media presentation.

The media platform 116 may be any device or combination of devices that receives a time-based media presentation from the media source and renders the time-based media presentation for viewing. This may include without limitation a computer, cable set top box, satellite dish, stereo, television, and so forth, as well as combinations of the foregoing. Thus for example a consumer may install a satellite dish, authenticate a satellite decoder over a telephone land line, decode satellite signals with a satellite decoder to provide a time-based media presentation in electronic form, and render the time-based media presentation using a television to render the video images and a stereo to render the audio portion 118.

The audio portion 118 of the time-based media presentation may be reproduced as sound energy in a viewing environment. The client device 102 may in general capture the audio portion 118 using the microphone 110 and analog-to-digital converter 109 to provide digital samples of the audio portion. These digital samples may be further processed by the client device 102 and used in a synchronization process as described in further detail below.

The data network 120 may include any data network such as, for example, the Internet, as well as any intermediate data networks or devices between the client device 102 and the server 122, such as local area data networks, Internet service providers, air interfaces to cellular or telecommunications company infrastructures, and so forth, as well as cable, telephone, or satellite infrastructure adapted for data communications. All such variations that can provide end-to-end data communications between the client device 102 and the server 122 may serve as the data network 120 described herein.

The server 122 may be any combination of hardware and software capable of responding to requests over the data network 120 from the client device 102. The server 122 may, for example, include a web server that responds to HyperText Transfer Protocol requests, or any other standard or proprietary information server that supports sessions with client devices for exchange of information as more generally described herein through a data network interface 124. The server 122 may also include a database 126, such as a relational database, lookup tables, files, and so forth, that stores information such as hash tables for pre-processed media, all as described in greater detail below. Any database capable of information retrieval consistent with operation of the server 122 as described herein may be used as the database 126 of the server 122.

Data network content sources 128 may be any sources of content connected to the data network 120. As generally discussed below, once the client device 102 is synchronized to a time-based media presentation, the client device 102 may retrieve and render synchronized content, either from the server 122 that provides synchronization functions, or any other data network content sources 128 such as web sites, advertisement servers, streaming media servers, e-commerce sites, or any other remote site or resource. The additional content synchronized to the time-based media presentation may, for example, include a supplemental video stream, contextual information, advertising, interactive content, and any other content that might be related to the time-based media presentation, and more specifically, to a particular time offset within the time-based media presentation. In general, the synchronized content may be retrieved on an as-needed basis during a presentation, or pre-cached for some or all of the presentation so that it is locally present in the memory 108 of the client device 102 at the appropriate time.

FIG. 2 is a flow chart of a server-side process for synchronization. In general, the process 200 may include pre-processing 201 of media to store hash tables or the like in a database 202, and responding to client requests for synchronization 203 based upon the hash tables for the pre-processed media, all as more specifically described below.

As shown in step 202, the process 200 may begin by receiving an audio portion of a time-based media presentation such as any of the media from any of the media sources described above.

As shown in step 204, the audio may be sampled into a sequence of digital samples from the audio portion. This may include digitizing an audio rendering of the audio portion, or where the media is available in digital format, simply copying the digital audio, or a subset of the digital audio, to provide a sequence of digital samples for further processing.

As shown in step 208, a plurality of hashes may be calculated from the sequence of digital samples of the time-based media presentation. In general, the plurality of hashes may be a time-wise sequence of hashes corresponding to digital samples of audio from the time-based media presentation. Each one of the plurality of hashes may be a non-unique representation of a portion of audio from the time-based media presentation corresponding to a particular time offset within the time-based media presentation.

A variety of hashing functions are known in the art and may be adapted to the audio-based synchronization systems described herein. One such hashing function is described in Ke et al., Computer Vision for Music Identification, the entire content of which is incorporated herein by reference. While Ke proposes a hashing function for use in music identification, the hashing algorithms of Ke can be adapted to synchronization as generally described herein. In one embodiment, a useful hashing function may include processing as described in greater detail below.

As an initial step, the amount of data from digital samples obtained at the native sampling rate may be reduced by selecting a subset of the digital samples at some predetermined frequency, e.g. every other sample, every third sample, and so forth. The digital samples may also or instead be downsampled to a predetermined frequency such as about five thousand five hundred Hertz (5.5 kHz) so that hashing can be performed consistently across multiple audio receiver types. The digital samples may also or instead be windowed to provide a sequence of overlapping, windowed data sets. In one embodiment, each one of the sequence of data sets may be obtained from a window of 1024 samples, with each window offset by 64 samples, thus providing a high degree of overlap for each windowed data set. More generally, any offset and/or window set consistent with the synchronization processes described herein may be employed.
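
By way of illustration only, and not of limitation, the sample selection and windowing described above might be sketched in Python as follows. The function names decimate and overlapping_windows are hypothetical and do not appear in this disclosure, and the sketch simply discards samples without any anti-aliasing filter, which is an assumption made for brevity.

```python
from typing import List, Sequence


def decimate(samples: Sequence[float], factor: int) -> List[float]:
    """Keep every `factor`-th sample (factor=2 keeps every other sample)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return list(samples[::factor])


def overlapping_windows(samples: Sequence[float],
                        window_size: int = 1024,
                        hop: int = 64) -> List[List[float]]:
    """Split samples into windows of `window_size`, each offset by `hop`.

    The defaults follow the 1024-sample window and 64-sample offset of the
    embodiment described above. Trailing samples that do not fill a whole
    window are dropped (an assumption of this sketch).
    """
    last_start = len(samples) - window_size
    return [list(samples[start:start + window_size])
            for start in range(0, last_start + 1, hop)]
```

With these defaults, consecutive windows share 960 of their 1024 samples, so each additional hash may be computed after only 64 new samples arrive.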

Each windowed data set (or sequence) of digital samples may also or instead be processed by normalizing a magnitude of the sequence of digital samples to some predetermined value. This step helps to mitigate differences in playback volume of a presentation, sensitivity of audio receiving hardware, distance from the media platform (or speakers of the media platform), room size, and other environmental conditions that might affect the sound captured by the client device. Each sequence of digital samples may also or instead be band pass filtered or low pass filtered, which may include filtering with a low pass filter to provide a filtered output. This may include the use of a digital filter having a 3 dB cutoff of 2.2 kHz, or about two kilohertz, or any other suitable digital and/or analog filter to reduce noise and suppress signal components outside the range of interest.
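
As a minimal sketch of the normalization step only, one possible approach scales each window so that its largest absolute sample equals a predetermined value. Peak normalization, the function name normalize_peak, and the default target of 1.0 are assumptions for illustration; other measures of magnitude such as root-mean-square level may serve equally well, and the filtering step is omitted here.

```python
from typing import List, Sequence


def normalize_peak(window: Sequence[float],
                   target_peak: float = 1.0) -> List[float]:
    """Scale a window so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in window), default=0.0)
    if peak == 0.0:
        # A silent (or empty) window has nothing to scale.
        return [0.0 for _ in window]
    scale = target_peak / peak
    return [s * scale for s in window]
```

Under this approach a quiet capture and a loud capture of the same audio yield approximately the same normalized window, which is the purpose of the step.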

However processed, each sequence of digital samples may be transformed into a frequency-domain representation using, e.g., a discrete Fourier transform or other suitable algorithm. The frequency-domain representation may then be hashed by dividing the frequency spectrum into a number of frequency bands and converting the signal energy in each band into a binary value according to the relative power in each band compared to each other one of the frequency bands within the frequency-domain representation. In one aspect, the spectrum may be divided into thirty two bands, with each band represented by a single bit (e.g., a one or a zero) to provide a thirty two bit hash of the sequence of digital samples. The spectrum may be divided in a number of ways, such as linearly into equal size bands or logarithmically into bands of logarithmically increasing bandwidth. The resulting hash, which provides a compact non-unique description of the sampled audio, may then be accumulated with additional hashes for further processing.
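
The band-based hashing described above might be sketched as follows, under stated assumptions. The sketch uses a naive discrete Fourier transform, equal-width bands, and a rule that sets a band's bit when its power exceeds the mean band power. That thresholding rule, the bit ordering, and the names power_spectrum and band_hash are illustrative assumptions only, and are not the hashing function of Ke or necessarily that of any embodiment.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(window: Sequence[float]) -> List[float]:
    """Naive O(n^2) DFT power for frequency bins 0 .. n/2 - 1."""
    n = len(window)
    spectrum = []
    for k in range(n // 2):
        acc = 0j
        for t, x in enumerate(window):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_hash(window: Sequence[float], n_bands: int = 32) -> int:
    """Return an n_bands-bit hash, one bit per equal-width frequency band.

    Assumption: a band's bit is 1 when its power exceeds the mean power
    across all bands. The lowest band is the most significant bit.
    """
    power = power_spectrum(window)
    if len(power) < n_bands:
        raise ValueError("window too short for the requested band count")
    width = len(power) // n_bands
    band_power = [sum(power[b * width:(b + 1) * width])
                  for b in range(n_bands)]
    mean_power = sum(band_power) / n_bands
    value = 0
    for p in band_power:
        value = (value << 1) | (1 if p > mean_power else 0)
    return value


# Usage: a pure tone in DFT bin 20 of a 64-sample window, hashed into 4 bands.
tone = [math.cos(2 * math.pi * 20 * t / 64) for t in range(64)]
print(format(band_hash(tone, n_bands=4), "04b"))  # prints 0010
```

Because each bit depends only on the power of a band relative to the other bands, scaling the whole window by a constant leaves the hash unchanged. A practical implementation would likely substitute a fast Fourier transform for the quadratic loop shown here.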

As shown in step 210, the sequence of hashes may be stored, along with the corresponding one or more time offsets, in a hash table that permits retrieval of the one or more time offsets with a hash value. The hash table may, for example, be stored in a database on a server configured to respond to a request from a client device.
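
A minimal in-memory sketch of such a hash table is shown below. It assumes that each time offset is simply the position of a hash in the time-wise sequence, and the name build_hash_table is hypothetical. A deployed server might instead keep this mapping in a relational database or other store.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_hash_table(hashes: Iterable[int]) -> Dict[int, List[int]]:
    """Map each hash value to every time offset at which it occurs.

    Hashes are non-unique, so one hash value may map to several offsets.
    """
    table: Dict[int, List[int]] = defaultdict(list)
    for offset, h in enumerate(hashes):
        table[h].append(offset)
    return table


# Usage: the hash value 7 occurs at offsets 0 and 2.
table = build_hash_table([7, 3, 7, 9])
print(table[7])          # prints [0, 2]
print(table.get(5, []))  # prints [] for a hash that never occurs
```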

The above pre-processing 201 may be performed any number of times for any number of time-based media presentations, with hash tables for each media item stored in the database 202 for subsequent synchronization processes. Turning now to the synchronization process 203, the following steps detail the manner in which a server responds to client requests. In general, the server may be configured to respond to a request from a client device containing a number of hashes (and explicit or implicit sequence numbers for the hashes) with a number of candidate time offsets corresponding to each one of the hashes. In general, the candidate time offsets may be resolved into an offset within the time-based media presentation by the server, or forwarded to the client for further processing. By performing this additional processing at the server, the client is relieved of further synchronization calculations and the offset can be advantageously transmitted over a data network as a single numerical value.

As shown in step 212, a server may receive a number of hashes from a client device. These hashes generally include hashes calculated at the client device based upon audio data acquired by the client device. The server may also receive supplemental information to assist in a synchronization process, such as explicit sequence numbers for each hash and/or a unique identifier of the time-based media presentation that explicitly identifies the presentation to the server. While the systems and methods described herein may be employed without such an identifier, this information can greatly simplify and speed synchronization calculations by reducing the data set against which the server must search for candidate time offsets.

As shown in step 214, a number of bitwise variations to each received hash may be identified. In general, this includes determining an allowable bit error for the hash, or a number of allowable bitwise variations that are to be evaluated in subsequent synchronization processing, which value may for example be stored in the memory of the client device and transmitted to the server. Finding the bitwise variations to the hash may also be described as determining all values within a specified Hamming distance of the calculated hash, which provides a certain allowance for variations between the ideal source audio (used for pre-processing as described above) and the audio portion of a presentation as captured and digitized by a client device. With a predetermined allowable bit error, all of the binary values within that number of bits of the hash may readily be determined using any suitable technique. One useful technique is described in greater detail below with reference to FIG. 3. Other techniques are known in the art and may be usefully employed to calculate bitwise variations to a hash as described herein. In one embodiment, the hash may include thirty two bits, and the allowable bit error may be eight bits. The resulting candidate hashes provide a basis for further synchronization processing that accommodates variations in the audio as captured by the client device.
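
One straightforward way to determine all values within a specified Hamming distance, offered only as an illustrative alternative to the technique of FIG. 3, is to flip every combination of up to the allowable number of bit positions. The function name bitwise_variations is hypothetical.

```python
from itertools import combinations
from typing import Iterator


def bitwise_variations(value: int, n_bits: int,
                       max_errors: int) -> Iterator[int]:
    """Yield every n_bits-wide value within max_errors bit flips of value.

    The unmodified value itself (zero flips) is yielded first.
    """
    for n_flips in range(max_errors + 1):
        for positions in combinations(range(n_bits), n_flips):
            mask = 0
            for p in positions:
                mask |= 1 << p
            yield value ^ mask


# Usage: a three-bit hash 101 with one allowable bit error.
print(sorted(format(v, "03b") for v in bitwise_variations(0b101, 3, 1)))
# prints ['001', '100', '101', '111']
```

For a thirty two bit hash and an allowable bit error of eight bits, this enumeration produces roughly fifteen million candidate values per hash, which suggests why a search structure that visits only values actually present in the hash table may be preferable.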

It will be understood that while calculation of candidate hashes is described above as a server-side function, the candidate hashes may also or instead be calculated by a client with suitable processing capability and communication bandwidth without impairing general operation of a synchronization process as described herein.

As shown in step 216, the candidate hashes may be evaluated to determine an actual offset within a time-based media presentation. For each candidate hash (which has a relative offset to other candidate hashes), any corresponding time offsets are retrieved from the hash table and a count or score is incremented for each one of the corresponding time offsets. A score or count is accumulated for each time offset retrieved from the hash table, with the scoring for each time offset shifted according to the sequence number (or time) of the corresponding candidate hash. In this manner, an offset within the time-based media most closely corresponding to a beginning of the hashes received from the client can be identified.

By way of simplified, illustrative example, the first client hash may produce two candidate hashes, and the two candidate hashes may yield three offsets at t=5, t=6, and t=10. The second client hash may produce two candidate hashes that yield from the hash table four offsets at t=6, t=10, t=14, and t=15. However, this second group of offsets must be shifted back one time increment to align with the previous group, so the second group would be used to accumulate a score at t=6−1=5, t=10−1=9, t=14−1=13, and t=15−1=14. Using a simple count, the accumulated scores would then be 2 at t=5, 1 at t=6, 1 at t=9, 1 at t=10, 1 at t=13, and 1 at t=14. A third client hash may produce two candidate hashes that yield a single offset at t=14. Again, this third group must be shifted back (two time increments) to align with the previous groups, so the third group would accumulate a score at t=14−2=12. At this point the best score occurs at t=5, and an inference may be drawn that the time at which the first hash was calculated at the client device corresponds to an offset of t=5 within the time-based media presentation. It will be readily appreciated that for a preferred embodiment using a thirty two bit hash and a Hamming distance of eight, a significantly greater number of time offsets will actually be produced. However, the same basic approach may be employed to accumulate or otherwise score potential offsets within the media based upon time offsets retrieved from the hash table for candidate hashes.
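
The simplified example above can be reproduced with a short sketch that shifts each retrieved offset back by the sequence number of its client hash and counts the results. The function name accumulate_scores and the representation of each group as a (sequence number, offsets) pair are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def accumulate_scores(groups: Iterable[Tuple[int, List[int]]]) -> Counter:
    """Count votes for the offset of the first client hash.

    Each group is (sequence_number, offsets retrieved for that client hash).
    Every offset is shifted back by its sequence number so that all groups
    vote on a common time base.
    """
    scores: Counter = Counter()
    for sequence_number, offsets in groups:
        for offset in offsets:
            scores[offset - sequence_number] += 1
    return scores


# The three client hashes of the example, with sequence numbers 0, 1, and 2.
example = [(0, [5, 6, 10]), (1, [6, 10, 14, 15]), (2, [14])]
scores = accumulate_scores(example)
best_offset, best_score = scores.most_common(1)[0]
print(best_offset, best_score)  # prints 5 2
```

The only offset that receives more than one vote is t=5, consistent with the inference drawn in the example.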

As shown in step 218, the best score from among the plurality of scores may be used to select and return to the client an offset within the time-based media presentation corresponding to the beginning of the sequence of hashes sent by the client device. It will be understood that the offset returned to the client may also or instead include the time corresponding to the last of the sequence of hashes, or some other offset such as a median offset or an offset adjusted for network latency. It should also be understood that the server may only conditionally return an offset, such as when the best score reaches some predetermined minimum, or when a score for one offset is greater than all other scores by some predetermined relative or absolute amount, or based upon any other criteria that might be used to evaluate the quality of the score(s) and/or the inferences drawn therefrom. In one practical implementation with scoring weighted according to the number of bits in each hash (e.g., a score of thirty two for each retrieved time offset), useful criteria for a reliable synchronization include a minimum score of five thousand and a score of at least twice the next greatest score. Of course, other combinations of criteria may also or instead be used to determine whether and when to return an offset to a client device.
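
A conditional return of this kind might be sketched as follows, using the minimum score of five thousand and the factor of two from the practical implementation described above as default parameters. The function name select_offset and its signature are hypothetical, and the handling of a lone candidate (treated as having a runner-up score of zero) is an assumption.

```python
from typing import Mapping, Optional


def select_offset(scores: Mapping[int, int],
                  min_score: int = 5000,
                  min_ratio: float = 2.0) -> Optional[int]:
    """Return the best-scoring offset, or None if it is not reliable.

    The best score must reach min_score and be at least min_ratio times
    the next greatest score.
    """
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_offset, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    if best_score < min_score:
        return None
    if best_score < min_ratio * runner_up:
        return None
    return best_offset
```

When the function returns None, a server might simply wait for additional hashes from the client device before attempting to return an offset.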

FIG. 3 illustrates a technique for identifying bitwise variations to a binary value. As described above, a synchronization process may include a step of identifying candidate hashes corresponding to bitwise variations in a hash value calculated by a client or, as alternatively stated, determining a number of bitwise variations to a calculated hash. As described below, these candidate hashes may be determined using a binary tree or binomial tree that is traversed in a manner that excludes branches of the tree for binary values that exceed the allowable bit error for, i.e., Hamming distance from, the calculated hash.

In order to efficiently locate hash values that differ by a certain number of bits from a calculated hash, the server may create a binomial tree data structure 300 to hold loaded hash values. In a thirty two bit embodiment, the data structure 300 has thirty two levels with one level for each bit position in the hash. Each level includes left and right branches corresponding to zeroes and ones in a bit position of the hash value. In the simplified, illustrative embodiment of FIG. 3, the data structure 300 stores a three bit hash value. Starting at the top of the tree, a binary value of 101 would follow a path through the tree and be placed into a corresponding bucket (labeled “101”) at the bottom of the data structure 300. In order to find hash values varying by not more than one bit, a search algorithm can traverse each leg of the tree as far as possible without traversing a branch that has more than one bit difference from the calculated hash (in this case resulting in terminals at “001”, “100”, and “111”). The efficiency in this approach results from the ability to avoid traversing branches that would not result in hashes within the desired Hamming distance. While the data structure 300 of FIG. 3 may appear simple, the processing gains are substantial for a thirty two bit hash and up to eight bits of variation. In general, the candidate hash values are not stored in the data structure 300. Rather, the candidate hash values are implied by the branch traversal that leads to a bucket at the bottom of the tree, with each terminal bucket representing a candidate hash, and containing zero or more position indices or time offsets corresponding to the implied candidate hash value. Thus, traversing the data structure 300 according to the bit error limits leads directly and efficiently to the hash table results for the calculated hash received from a client device. Thus in one aspect determining bitwise variations (FIG. 2, step 214) and evaluating candidate hashes (FIG. 2, step 216) to find candidate offsets may be combined into a single processing step. Other techniques suitable for identifying and evaluating candidate hashes will readily be appreciated, any of which may also or instead be adapted for use in the synchronization systems and methods disclosed herein.
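The pruned traversal described above may be sketched as follows for the three bit example. This is one possible realization under stated assumptions, not the data structure 300 itself: the names `build_tree` and `search` are hypothetical, the tree is held in nested dictionaries, and only paths for stored hashes are created, so buckets that would hold zero offsets are simply absent.

```python
def build_tree(hash_table, bits):
    """Insert each stored hash into a binary tree with one level per bit."""
    root = {}
    for value, offsets in hash_table.items():
        node = root
        for level in range(bits - 1, -1, -1):  # most significant bit first
            bit = (value >> level) & 1
            node = node.setdefault(bit, {})
        node.setdefault("offsets", []).extend(offsets)  # terminal bucket
    return root


def search(node, target, bits, max_errors, level=None, errors=0, prefix=0):
    """Yield (candidate_hash, offsets) for buckets within max_errors bits."""
    if level is None:
        level = bits - 1
    if level < 0:
        # The candidate hash is implied by the path taken to this bucket.
        yield prefix, node.get("offsets", [])
        return
    wanted = (target >> level) & 1
    for bit in (0, 1):
        child = node.get(bit)
        if child is None:
            continue
        cost = errors + (1 if bit != wanted else 0)
        if cost > max_errors:
            continue  # prune: this branch exceeds the allowed Hamming distance
        yield from search(child, target, bits, max_errors,
                          level - 1, cost, (prefix << 1) | bit)


# Hypothetical three bit table mapping stored hashes to time offsets.
table = {0b101: [7], 0b001: [3], 0b110: [9], 0b111: [4, 12]}
tree = build_tree(table, bits=3)
found = dict(search(tree, target=0b101, bits=3, max_errors=1))
print(found)  # {1: [3], 5: [7], 7: [4, 12]}
```

In this sketch the stored value 110 differs from 101 in two bit positions, so its branch is abandoned before the bucket is reached, while the buckets for 001, 101, and 111 are returned together with their time offsets in a single pass.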

FIG. 4 is a flow chart of a client-side process for synchronization. The process 400 may in general include processing received audio to generate a sequence of hashes, and then transmitting the hashes to a server for remote calculation of a time offset in a time-based media presentation, after which a client device, which may be any of the client devices described above, may render synchronized content.

As shown in step 404, a client device, which may be any of the client devices described above, may be set up for synchronization such as by installing an application on the client device that performs synchronization functions, and/or any applications that might use synchronization to retrieve and/or display synchronized content. This may also or instead include establishing programming interfaces on the client device between existing applications and a synchronization application so that programs that are already installed (such as media players, web browsers, and so forth) can render synchronized content.

As shown in step 406, the client device may receive audio. This may, for example, include receiving an audio portion of a time-based media presentation with a microphone of the client device.

As shown in step 408, the client device may sample the audio, such as by using the analog-to-digital converter to provide a plurality of digital samples, and may receive at the processor a sequence of digital samples obtained with a sampling rate that establishes a time-based relationship among the sequence of digital samples. In one aspect, the subsequent hashing steps may be performed on overlapping windows of digital audio data, so that a next sequence of digital samples is obtained from an overlapping window of the audio portion of the time-based media presentation. In this manner, the windowing provides a series of overlapping sets of digital samples from the raw sequence of digital samples. The sets of digital samples may be further processed, such as by preserving only a subset of digital samples for processing, e.g., every other sample, every third sample, every eighth sample, or any other reduced data set consistent with proper functioning of subsequent synchronization functions.
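The windowing and sample reduction of step 408 may be sketched as follows. The function names and the parameters `window_size`, `hop`, and `keep_every` are hypothetical, and no particular window length or overlap is implied by this disclosure.

```python
def overlapping_windows(samples, window_size, hop):
    """Split samples into windows of window_size that start every hop samples.

    A hop smaller than window_size makes consecutive windows overlap.
    """
    if not 0 < hop <= window_size:
        raise ValueError("hop must be between 1 and window_size")
    last_start = len(samples) - window_size
    return [samples[start:start + window_size]
            for start in range(0, last_start + 1, hop)]


def reduce_samples(window, keep_every):
    """Preserve every keep_every-th sample of a window (e.g., every other)."""
    return window[::keep_every]


samples = list(range(10))  # stand-in for digitized audio
windows = overlapping_windows(samples, window_size=4, hop=2)
print(windows)                         # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
print(reduce_samples(windows[0], 2))   # [0, 2]
```

Discarding samples without first low pass filtering can alias high-frequency content into the retained samples, so a practical reduction would typically follow a filtering step.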

As shown in step 410, the digital samples, such as a sequence or set of windowed digital samples, may be processed into a hash including a number of bits that non-uniquely corresponds to a portion of the time-based media presentation (and a time offset of that portion within the presentation). Over numerous repetitions of the process, a number of sequential hashes may be obtained for overlapping windows of digital samples. Each one of the hashes is derived from the content of a corresponding audio portion of the time-based media presentation, but does not uniquely identify the audio portion that it was derived from. That is, numerous segments of audio from the presentation may yield the same hash. Each one of the hashes may also have a sequence number, or a relative time offset to each other one of the plurality of hashes. These relative time offsets are generally not absolute in terms of the presentation, but may serve as an accurate indicator of the relative timing of each window of digital samples from which a hash was obtained. More generally, hashes may be prepared in a complementary process to the hashing performed on the pre-processed media as described above, and any suitable processing of the digital samples may be performed consistent with the processing performed on the pre-processed media so that matching and synchronization can be performed.
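One way to derive such a hash from a window of samples is sketched below, following the general outline recited in the claims: transform the window into a frequency-domain representation, divide it into frequency bands, and convert each band to a single bit according to its relative power. The function names are hypothetical, the naive discrete Fourier transform is used only to keep the sketch self-contained, and the rule of comparing each band to the mean band power is an assumption made for illustration rather than the comparison used in any particular embodiment.

```python
import cmath
import math


def band_powers(window, num_bands):
    """Return the summed spectral power in num_bands equal-width bands."""
    n = len(window)
    powers = []
    for k in range(1, n // 2 + 1):  # skip the DC bin
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(window))
        powers.append(abs(acc) ** 2)
    per_band = len(powers) // num_bands
    if per_band == 0:
        raise ValueError("window too short for the requested number of bands")
    return [sum(powers[b * per_band:(b + 1) * per_band])
            for b in range(num_bands)]


def hash_window(window, num_bands=32):
    """Pack one bit per band: 1 if the band exceeds the mean band power."""
    powers = band_powers(window, num_bands)
    mean_power = sum(powers) / len(powers)
    value = 0
    for power in powers:
        value = (value << 1) | (1 if power > mean_power else 0)
    return value


# A pure tone in the lowest of four bands sets only the leading bit.
tone = [math.sin(2 * math.pi * 3 * i / 64) for i in range(64)]
print(bin(hash_window(tone, num_bands=4)))  # 0b1000
```

Because each bit depends only on relative band power, scaling the amplitude of the window leaves the hash unchanged in this sketch, and many different audio segments can map to the same value, consistent with the non-unique correspondence described above.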

As shown in step 412, a sequence of hashes may be transmitted to a server, along with any additional information such as a unique identifier for the time-based media presentation from which the hashes were derived and a sequence number for each one of the sequence of hashes indicating a relative time offset among the hashes. The time-based media presentation may be identified in a number of ways. For example, a user of the client device may manually identify the time-based media presentation, or may provide descriptive information helpful in identifying the media such as a title of a television series, biographical data (actors, content, etc.), a time, date, and/or channel on which the media was broadcast, or any other useful information. In another aspect, the media may be identified using remote content analysis, such as by streaming audio or video samples directly to a remote server. While this process may be relatively bandwidth and/or computationally expensive, it may be performed one time prior to a synchronization, after which the more efficient synchronization techniques described herein may be employed to determine an offset within the time-based media presentation.

As shown in step 414, the client device may determine whether an offset has been received from the server. If an offset has been received from the server indicative of a time offset within the time-based media presentation, the process 400 may proceed to step 416 where the client device synchronizes based on the offset. If no offset has been received, the process 400 may return to step 406 and the client device may receive, sample, and hash additional audio content for forwarding to the server. The server may also or instead respond with an explicit indication of a failure to determine the offset. Where an offset is returned, the offset may be provided as a specific offset within the time-based media presentation as generally described above, or a number of candidate offsets may be returned to the client device for local evaluation.

As shown in step 416, the client device may synchronize to the time-based media presentation based upon the offset received from the server, such as by storing in an application on the client device a current offset within the time-based media presentation. The local application may then coordinate synchronized activities on the client device such as retrieving relevant content, launching additional media viewers, web browsers, interactive programs or applets, and so forth. A synchronization indicator may be displayed on the client device indicating that a reliable synchronization has been achieved using, e.g., an icon or symbol on a display of the client device, or another indicator such as an audible tone, a flashing light-emitting diode, an animation, and so forth. Once synchronization has been achieved, the client device may autonomously maintain synchronization by assuming uninterrupted delivery of the time-based media presentation, and/or the client device may continuously or periodically confirm synchronization with additional sequences of hashes transmitted to the server.
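Autonomous maintenance of synchronization may be sketched as follows. The class name `SyncClock` and its methods are hypothetical; the sketch assumes uninterrupted playback at normal speed, offsets expressed in seconds, and a monotonic local clock.

```python
import time


class SyncClock:
    """Track the current offset within a presentation after synchronization."""

    def __init__(self, offset_seconds, clock=time.monotonic):
        self._clock = clock
        self._anchor = clock()          # local time when the offset was stored
        self._offset = offset_seconds   # offset returned by the server

    def current_offset(self):
        """Offset now, assuming uninterrupted delivery since the last sync."""
        return self._offset + (self._clock() - self._anchor)

    def resync(self, offset_seconds):
        """Replace the stored offset after a later confirmation by the server."""
        self._anchor = self._clock()
        self._offset = offset_seconds


# A fake clock makes the behavior easy to follow.
fake_now = [100.0]
sync = SyncClock(5.0, clock=lambda: fake_now[0])
fake_now[0] = 112.5                     # 12.5 seconds of playback elapse
print(sync.current_offset())            # 17.5
```

A periodic confirmation with additional hashes would call `resync` with the newly returned offset, which corrects any drift caused by pausing or skipping within the presentation.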

As shown in step 418, once the client device has synchronized to the time-based media presentation, synchronized content may be rendered on the client device. This may include any additional content such as supplemental streaming video, textual information, interactive content, advertisements, hyperlinks, and so forth. An application on the client device that coordinates synchronization using the remote server may also control rendering of the additional content in a manner that is synchronized to the time-based media, either by directly rendering the content or by controlling one or more other applications on the client device to render the content.

In addition, audience feedback concerning the time-based media presentation may be gathered from time-shifted views of the presentation and correlated to audience feedback from a live presentation. The feedback may, for example, be gathered explicitly with user inputs to the client device, or implicitly such as by detecting a change of channel or termination of the presentation using, e.g., the audience tracking techniques described below. Thus in one aspect there is disclosed herein a technique for combining additional audience (or client device) feedback from time-shifted viewing with live audience feedback to provide feedback data that aggregates audience feedback synchronized to both a live version of the presentation and a time-shifted view of the presentation.

It will be understood that the steps of the above methods may be varied in sequence, repeated, modified, or deleted, or additional steps may be added, all without departing from the scope of this disclosure. By way of example, various processing steps may be performed on the server, on the client device, or some combination of these. In addition, a client device may synchronize to multiple media sources at one time, and a server may be configured to support synchronization of multiple clients at one time. Thus the details of the foregoing will be understood as non-limiting examples of the systems and methods of this disclosure.

FIG. 5 is a block diagram of an audience tracking system. In general, the system 500 may include a number of client devices 502 receiving audio 504 from a media source 505 such as a television broadcast. The client devices 502 may process the audio 504 to derive a sequence of hashes that are transmitted over a data network 506 to a server 508 where analysis can be performed.

The client devices 502 may, for example, be any of the client devices described above. While four client devices 502 are depicted, any number of client devices 502 may participate in the system 500, including any combination of client devices 502 at one geographic location and/or numerous geographic locations. Each client device 502 may receive the audio 504 and create a sequence of hashes that characterize audio content within the audio 504. This may include any of the hashing processes described above, or any other hashing process that uniquely or non-uniquely identifies the audio content.

The media source 505 may, for example, include television systems or stereo or other audio output systems rendering media such as a live television broadcast. Where the client devices 502 are geographically distributed, the media source 505 may likewise include hardware rendering the broadcast at a variety of locations including public locations such as airports, lounges, waiting rooms, and so forth, as well as private locations such as homes or offices, as well as any combination of these.

The data network 506 may include any of the data networks described above, and the server 508 may include any server or combination of servers or the like capable of receiving sequences of hashes from client devices 502 and processing the sequences of hashes as described further below.

FIG. 6 is a flow chart of an audience tracking process. In general the process 600 includes hashing audio content at a number of client devices and forwarding the resulting sequences of hashes to a server for analysis.

As shown in step 602, the process 600 may begin by broadcasting media having an audio component. The broadcast media may include televised programming such as any live or pre-recorded television content including a television series, a movie, a sports event, informational programming, news, and so forth.

As shown in step 604, audio content from the broadcast media may be received by a number of client devices exposed to the broadcast media.

As shown in step 606, each client device may hash or otherwise process the audio content into a time-based sequence of hashes that uniquely or non-uniquely identify the audio content in the broadcast media at a particular time.

As shown in step 608, each client device may transmit the sequence of hashes to a server, such as any of the servers described above.

As shown in step 610, the server may receive the sequence of hashes from each participating client device, along with related information such as any explicit supplemental information provided by each client device, or information such as an IP address or the like for each client device, any of which may be usefully processed by the server to assist with subsequent analysis.

As shown in step 612, the server may analyze the sequences of hashes received from the participating client devices. A variety of useful inferences may be drawn from the resulting data set, including monitoring of audience behavior (such as channel changing) and advertising characteristics as described below. It will be readily appreciated that a range of additional statistics and conclusions may also or instead be extracted from the data set.

In one aspect, sequences of hashes from client devices exposed to a broadcast may be monitored in order to create descriptive signatures dynamically. For example, as client devices receive a broadcast, they may each create a sequence of hashes for the server. A general location for each client device may also be specified in advance by the client device, or inferred from the content that is being broadcast or other data such as the IP addresses for the client devices. As the client-generated signatures for a broadcast are received by the server, these submissions may be processed and an average or other composite signature may be obtained. A variety of techniques for combining or otherwise characterizing such variations may be employed. However derived, the composite signature may be stored and subsequently applied to correlate new references to the broadcast program to a particular time within the original broadcast. This may be useful, for example, when a viewer is watching a program on a time-shifted basis, such as to synchronize supplemental content to the time-shifted view. In this manner, the pre-processing described above may be omitted, and hash tables or the like for time-shifted synchronization may be created automatically from the sequences of hashes received from client devices during the live broadcast.
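One simple way to form a composite signature from client submissions is sketched below. A bitwise majority vote is used here purely as an assumed stand-in for "an average or other composite signature"; the function names are hypothetical, and the sketch assumes that the client sequences have already been aligned so that hashes at the same index describe the same moment of the broadcast.

```python
def composite_hash(hashes, bits=32):
    """Combine concurrent hashes by majority vote on each bit position."""
    value = 0
    for position in range(bits):
        ones = sum((h >> position) & 1 for h in hashes)
        if 2 * ones > len(hashes):  # strict majority of clients report a one
            value |= 1 << position
    return value


def composite_signature(client_sequences, bits=32):
    """Build a composite sequence from aligned per-client hash sequences."""
    return [composite_hash(slot, bits) for slot in zip(*client_sequences)]


clients = [
    [0b1010, 0b0001],
    [0b1000, 0b0001],
    [0b1011, 0b0011],
]
print([bin(h) for h in composite_signature(clients, bits=4)])  # ['0b1010', '0b1']
```

The resulting sequence could then be loaded into a hash table against its time indices in the same manner as hashes obtained by pre-processing.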

In another aspect, the sequences of hashes may be analyzed to identify when local commercials are being aired. When a program is on, the averaged audio signals and the resulting sequences of hashes from client devices may remain within a narrow band based upon the underlying content. However, during commercial breaks, content may vary significantly based upon the advertising that is broadcast by each local network. When this happens, there may be a spike or other measurable change in signatures that varies according to the corresponding variation in advertisement content. This information may be usefully employed to infer a geographic location of client devices and for any other related purposes. This information may also or instead be used to distinguish between advertisements and other broadcast content, which may be usefully employed, for example, to determine how to relate post-broadcast signatures to the originally-broadcast content. Thus more generally, based upon server analysis of sequences of hashes, the process 600 may include identifying an occurrence of a commercial break in the television broadcast based upon variations in concurrent ones of the plurality of hashes received from different ones of the client devices.

In another aspect, the sequences of hashes may be analyzed to identify network commercials. It has been observed that when commercials begin, a certain percentage of the public changes the channel. This will cause a deviation in the average audio signal band, but it will be the case that this deviation will occur to some extent in all localities. This pattern in received, client-generated signatures may be used to infer an occurrence of a commercial break. By extracting out the deviations and looking at the averaged data of those who have chosen to stay on the commercials, it will be possible to determine whether the commercials being played are network-wide or are local.

Thus in one aspect, the process 600 may include identifying a channel change in proximity to one of the client devices based upon a variation in the sequence of hashes received from the client device. In another aspect, the process 600 may include inferring a geographic proximity among two or more of the client devices based upon a similarity in concurrent ones of the hashes received from two or more of the plurality of devices. In still another aspect, the process 600 may include determining whether a local advertisement or a network advertisement is being aired during a commercial break based upon variations among the hashes received from the various client devices.
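Variation among concurrent hashes may be measured in many ways. The sketch below uses the mean pairwise Hamming distance among the hashes reported for the same time slot and flags slots where that spread exceeds a threshold, which might indicate divergent local content or channel changes. The function names and the threshold value are hypothetical, and the choice of metric is an assumption made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def mean_spread(concurrent_hashes):
    """Mean pairwise Hamming distance among hashes for one time slot."""
    pairs = list(combinations(concurrent_hashes, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def flag_divergent_slots(slots, threshold):
    """Return indices of time slots whose spread exceeds the threshold."""
    return [index for index, hashes in enumerate(slots)
            if mean_spread(hashes) > threshold]


slots = [
    [0b1111, 0b1111, 0b1110],  # clients largely agree (program content)
    [0b1111, 0b0000, 0b1010],  # clients diverge (possibly local advertising)
]
print(flag_divergent_slots(slots, threshold=2))  # [1]
```

Applying the same spread measure to subsets of client devices, rather than to all devices at once, would be one way to group devices whose hashes remain similar during a flagged interval.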

Still more generally, by processing audio content from a broadcast device (such as a television or radio) on a client device and transmitting characteristic information to a server, the server can derive a variety of useful metrics that describe the broadcast stream as well as audience location, audience engagement in broadcast content, and so forth.

It will be appreciated that many of the above systems, devices, methods, processes, and the like may be realized in hardware, software, or any combination of these suitable for the data processing, data communications, and other functions described herein. This includes realization in one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors or other programmable devices or processing circuitry, along with internal and/or external memory. This may also, or instead, include one or more application specific integrated circuits, programmable gate arrays, programmable array logic components, or any other device or devices that may be configured to process electronic signals. It will further be appreciated that a realization of the processes or devices described above may include computer-executable code created using a structured programming language such as C, an object oriented programming language such as C++, or any other high-level or low-level programming language (including assembly languages, hardware description languages, and database programming languages and technologies) that may be stored, compiled or interpreted to run on one of the above devices, as well as heterogeneous combinations of processors, processor architectures, or combinations of different hardware and software. At the same time, processing may be distributed across devices such as the various systems described above, or all of the functionality may be integrated into a dedicated, standalone device. All such permutations and combinations are intended to fall within the scope of the present disclosure.

In other embodiments, disclosed herein are computer program products comprising computer-executable code or computer-usable code that, when executing on one or more computing devices (such as the devices/systems described above), performs any and/or all of the steps described above. The code may be stored in a computer memory or other non-transitory computer readable medium, which may be a memory from which the program executes (such as internal or external random access memory associated with a processor), a storage device such as a disk drive, flash memory or any other optical, electromagnetic, magnetic, infrared or other device or combination of devices. In another aspect, any of the processes described above may be embodied in any suitable transmission or propagation medium carrying the computer-executable code described above and/or any inputs or outputs from same.

It will be appreciated that the methods and systems described above are set forth by way of example and not of limitation. Numerous variations, additions, omissions, and other modifications will be apparent to one of ordinary skill in the art. While particular embodiments of the present invention have been shown and described, it will be apparent to those skilled in the art that various changes and modifications in form and details may be made therein without departing from the spirit and scope of the invention as defined by the following claims. The claims that follow are intended to include all such variations and modifications that might fall within their scope, and should be interpreted in the broadest sense allowable by law.

1. A method comprising: receiving a time-based media presentation that includes an audio portion; sampling the audio to obtain a sequence of digital samples of the audio portion; processing the sequence of digital samples to provide a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to one or more time offsets within the time-based media presentation; storing the plurality of hashes and the one or more time offsets in a hash table on a server; and configuring the server to respond to a request that contains a second plurality of hashes, each one of the second plurality of hashes having a predetermined relative offset to each other one of the second plurality of hashes, by retrieving from the hash table a plurality of candidate offsets within the time-based media presentation corresponding to the second plurality of hashes.
2. The method of claim 1 further comprising transmitting the plurality of candidate offsets from the server to a client device that initiated the request.
3. The method of claim 2 further comprising resolving the plurality of candidate offsets into an offset within the time-based media presentation that most closely corresponds to the plurality of candidate offsets on the client device.
4. The method of claim 1 further comprising resolving the plurality of candidate offsets into an offset within the time-based media presentation that most closely corresponds to the plurality of candidate offsets and transmitting the offset to the client device that initiated the request.
5. The method of claim 1 further comprising calculating on the server a plurality of bitwise variations to each one of the second plurality of hashes, thereby providing a third plurality of hashes representative of the second plurality of hashes and a number of bit errors therein, and retrieving any time offsets corresponding to each one of the third plurality of hashes as the plurality of candidate offsets.
6. The method of claim 1 wherein the second plurality of hashes received by the server includes a number of bitwise variations to client-calculated hashes, wherein the number of bitwise variations are representative of potential bit errors in the client-calculated hashes.
7. The method of claim 1 further comprising storing on the server a plurality of hash tables for each one of a plurality of time-based media presentations.
8. The method of claim 7 further comprising receiving from a client device that initiated the request a unique identification of the one of the plurality of time-based media presentations from which the second plurality of hashes was obtained.
9. The method of claim 7 wherein the plurality of time-based media presentations include television broadcasts.
10. The method of claim 7 wherein the plurality of time-based media presentations include pre-recorded media distributed on one or more of a CD, a DVD, a Blu-ray Disc, and an HD DVD.
11. The method of claim 7 wherein the second plurality of hashes is obtained from a time shifted viewing of one of the plurality of time-based media presentations.
12. The method of claim 1 wherein processing the sequence of digital samples to provide a plurality of hashes includes low pass filtering the sequence of digital samples.
13. The method of claim 1 wherein processing the sequence of digital samples to provide a plurality of hashes includes normalizing a magnitude of the sequence of digital samples.
14. The method of claim 1 wherein processing the sequence of digital samples includes windowing the sequence of digital samples to provide a series of overlapping sets of digital samples from the sequence of digital samples.
15. The method of claim 14 wherein processing the sequence of digital samples includes transforming each one of the overlapping sets of digital samples into a frequency-domain representation.
16. The method of claim 15 wherein processing the sequence of digital samples includes dividing the frequency-domain representation into a plurality of frequency bands and converting each one of the plurality of frequency bands into a binary value according to a relative power of the one of the plurality of frequency bands to the other ones of the plurality of frequency bands within the frequency-domain representation.
17. The method of claim 16 wherein the binary value consists of a one or a zero.
18. A computer program product for audio-based synchronization comprising computer executable code embodied on a non-transitory computer readable medium that, when executing on one or more computing devices, performs the steps of: receiving a time-based media presentation that includes an audio portion; sampling the audio to obtain a sequence of digital samples of the audio portion; processing the sequence of digital samples to provide a plurality of hashes, each one of the plurality of hashes non-uniquely corresponding to one or more time offsets within the time-based media presentation; storing the plurality of hashes and the one or more time offsets in a hash table on a server; and responding to a request that contains a second plurality of hashes, each one of the second plurality of hashes having a predetermined relative offset to each other one of the second plurality of hashes, by retrieving from the hash table a plurality of candidate offsets within the media-based presentation corresponding to the second plurality of hashes.
19. A device comprising: a database that stores a hash table, the hash table containing a plurality of hashes, each one of the plurality of hashes processed from a sequence of digital samples in an audio portion of a time-based media presentation, wherein each one of the plurality of hashes non-uniquely corresponds to one or more time offsets within the time-based media presentation; and a server coupled in a communicating relationship with the database and a data network, the server configured to respond to a request that contains a second plurality of hashes, each one of the second plurality of hashes having a predetermined relative offset to each other one of the second plurality of hashes, by retrieving from the hash table a plurality of candidate offsets within the time-based media presentation that correspond to the second plurality of hashes.